Adapting the Core Language Engine to French and Spanish 



Manny Rayner and David Carter 

SRI International 
Suite 23, Millers Yard 
Cambridge CB2 1RQ
United Kingdom 
{manny, dmc}@cam.sri.com



Pierrette Bouillon 

ISSCO, University of Geneva 
54, route des Acacias 
1227 Geneva 
Switzerland 
Pierrette.Bouillon@issco.unige.ch



Abstract 

We describe how substantial domain- 
independent language-processing systems 
for French and Spanish were quickly de- 
veloped by manually adapting an exist- 
ing English-language system, the SRI Core 
Language Engine. We explain the adapta- 
tion process in detail, and argue that it pro- 
vides a fairly general recipe for converting 
a grammar-based system for English into a 
corresponding one for a Romance language. 

1 Introduction 

In this paper, we will describe how substantial domain-independent
language-processing systems for French and Spanish were quickly
developed by manually adapting an existing English system, the SRI
Core Language Engine. The resulting systems have been integrated as
components in the Spoken Language Translator (SLT; (Rayner et al.,
1993a; Agnäs et al., 1994)). The English to French version of SLT
(Rayner and Bouillon, 1995) is of a standard comparable to the
original English to Swedish version.

The syntactic rule-set for French covers nearly all the basic
constructions of the language, including the following: declarative,
interrogative and imperative clauses; formation of YN and WH-questions
using inversion, complex inversion and "est-ce que"; clitic pronouns;
adverbial modification; negation; nominal and verbal PPs; complements
to "être" and "il y a"; relative clauses, including those with "dont";
partitives, including use of "en"; passives; pre- and post-nominal
adjectival modification, including comparative and superlative; code
expressions; sentential complements and embedded questions; complex
determiners; numerical expressions; date and time expressions;
conjunction of most major constituents; and a wide variety of verb
types, including modals and reflexives. There is a good treatment of
inflectional morphology which includes all major paradigms. The
coverage of the Spanish grammar is comparable in scope, though
slightly less extensive. The French and Spanish versions of the CLE
are both "reversible", and can be used for either analysis or
generation.

We will describe the adaptation process in detail, 
and argue that it provides a fairly general recipe for 
converting a grammar-based system for English into 
a corresponding one for a Romance language. Due 
to space limitations, and since it is rather the better 
of the two, we will concentrate on the French version. 
Examples will be taken from the Air Travel Planning 
(ATIS) domain used in the current SLT prototype. 

The rest of the paper is organized as follows. Section 2 gives an
overview of the CLE, focussing on the aspects relevant to this paper.
Section 3 describes the French morphology rules. Sections 4 and 5
describe the French and Spanish grammars. Section 6 concludes.

2 Overview of the Core Language 
Engine 

The CLE is a general language-processing system, which has been
developed by SRI International in a series of projects starting in
1986. The original system was for English only. A Swedish version
(Alshawi (ed), 1992, §14.2) was developed in a collaboration with the
Swedish Institute of Computer Science; the French and Spanish versions
described here were developed in collaborations with ISSCO and the
University of Seville respectively. The CLE is extensively described
elsewhere (Alshawi (ed), 1992; Alshawi et al., 1992; Agnäs et al.,
1994), so this section will only give the minimum background necessary
to understand the remainder of the paper.

The basic functionality offered by the CLE is two-way translation
between surface form and a representation in terms of a logic-based
formalism called QLF (Alshawi and Crouch, 1992). The modules
comprising a version of the CLE for a given language can be divided
into three groups, which we refer to as "code", "rules" and
"preferences". The "code" modules constitute the language-independent
compilers and interpreters that make up the basic processing engine;
the other two types of module between them constitute a declarative
description of the language. The "rules" contain domain-independent
lexico-grammatical information for the language in question; they
encode a relationship between surface strings and QLF representations.
Thus for any given surface string, the rules define a set of possible
QLF representations of that string. Conversely, given a well-formed
QLF representation, the rules can be used to produce a set of possible
surface-form realisations of the QLF. The code modules support
compilation of the rules into forms that allow fast processing in both
directions: surface form → QLF (analysis) and QLF → surface form
(generation).

The relationship between surface form and QLF is in general
many-to-many. "Preference" modules contain data in the form of
statistically learned distributional facts, based on analysis of
domain corpora (Alshawi and Carter, 1994; Rayner and Bouillon, 1995).
Using this extra information, the system can distinguish between
plausible and implausible applications of the rules with a fairly high
degree of accuracy. In particular, the preference information makes it
possible in a given application domain to select the intended readings
of ambiguous utterances. We will not consider the preference modules
further here, as the statistical training procedures are completely
language-independent and invisible to the developer.

The non-trivial problems involved in adapting the CLE to a new
language arise in connection with the "rule" modules, which we will
now consider in more detail. These fall into the following main
categories:

Lexicon: The largest single set of rules is that which comprises the
lexicon. This is in fact divided up into three subsets: a
function-word lexicon; a set of macros specifying generic lexical
entries for common types of content word (e.g. "count noun",
"intransitive verb", etc.); and a content word lexicon which defines
lexical entries for other words in terms of the generic macros.

Morphology: This set of rules defines the inflectional morphology of
the language, allowing analysis of words in terms of stems and
inflections. The morphology rules are described further in Section 3.

Syntax: Syntax rules are written in a unification-based
feature-grammar formalism. The style of the CLE grammar is loosely
based on GPSG; detailed descriptions of the CLE grammar for English
are available in (Pulman, 1992) and (Rayner, 1994).

Semantics: The CLE grammar is "sign-based" (van Eijck and Moore,
1992); each syntax rule is coupled with one or more semantic
counterparts, which define the piece of QLF form produced by that
rule. QLF representations are built up compositionally using
unification only.

Reference resolution and scoping: Further sets of rules can be used to
convert QLF representations into representations in full first-order
logic. This phase of processing is required, for example, when using
the CLE for database query applications (Alshawi and Crouch, 1992;
Rayner, 1993).



As noted in (Alshawi (ed), 1992, §14.2.2), the effort
involved in adapting a set of rule modules to a new 
language depends on how directly they refer to sur- 
face form; unsurprisingly, modules defining surface 
phenomena are the ones which require most work. 
When adapting the system to French and Spanish, 
the problems arose almost exclusively in connection 
with morphology and syntax rules. Other parts of 
the English system were adapted with little effort. In 
particular, the semantic rule-sets for English could 
be used for the new languages with only minimal 
changes. 

The following two sections describe in detail the 
issues pertaining to the morphology and syntax rule- 
sets respectively. 

3 Morphology and spelling 



In order to handle the more complex inflectional morphology of Romance
and other European languages, a morphological processor based on
feature-augmented two-level morphology was developed (Carter, 1995).
This allows the complex spelling changes occurring in these languages
to be handled quickly in both analysis and generation. Compilation of
the full sets of two-level rules describing spelling changes and of
production rules describing legal affix combinations takes of the
order of a minute, allowing changes to the rules to be debugged
relatively easily. Further flexibility is gained by not requiring the
lexicon to be present at compile time (contrast (Kaplan and Kay,
1994)); thus the lexicon can be incremented and tested without any
recompilation being required. Two-level spelling rules were also used
to describe the inter-word effects that are particularly common in
French.

The total number of rules required to describe in- 
flectional morphology was around 75 for French and 
50 for Spanish (inter-word rules being responsible 
for much of the difference). We concentrate here on 
the French phenomena, which are more complex. 

3.1 Intra-word spelling changes

Intra-word spelling changes for French present several problems not
encountered in English inflectional morphology. Some of these are
technical in nature, and easily dealt with. In particular, French
exhibits many multiple letter changes, e.g. "chameau+e" → chamelle,
"peign+rai" → peindrai. For reasons explained in (Carter, 1995), these
must be handled by a separate rule for each letter that changes,
rather than one for the whole changed substring. Also, some changes
can be optional. For example, the "y" in verbs such as "payer" can
remain the same or change to "i" before silent "e": "pay+e" → either
paye or paie. This phenomenon is rare or absent in English, but is
handled easily by making the relevant spelling rule optional.

Less trivial problems, however, arise from the fact 
that spelling changes in French generally cannot be 
predicted from the surface form of the word alone. 
This means the application of the rules must be con- 
trolled; we do this by specifying feature constraints, 
which must match between the rule and all mor- 
phemes it applies to. The following extended ex- 
ample describes our treatment of one of the most 
challenging cases. 

Nouns, adjectives and verbs ending in "-et" or "-el" can either double
the "t" or "l" before a silent "e" or change the prefinal "e" to "è":
"cadet+e" → cadette, but "complet+e" → complète. The application of
the spelling rules is therefore controlled by means of a feature
spelling_type, with value double in the first case and change_e_è in
the second.

This situation is further complicated by two facts. Firstly, the
surface "èl" or "èt" of the verbs is ambiguous between a deep "el" or
"et", and "él" or "ét". For example, we have achète ← "achet+e", but
affrète ← "affrét+e". For this reason, we introduce a third value for
spelling_type: change_é_è. "Affrét" has thus the feature
spelling_type=change_é_è, "achet" spelling_type=change_e_è and "appel"
spelling_type=double.

Secondly, the "e" that begins future and conditional endings sometimes
affects preceding letters as if it were silent, and sometimes as if it
were not. For example, "appel+erai" → appellerai, doubling the "l"
just as in "appel+e" → appelle, where the final "e" actually is
silent. However, "céd+erai" → céderai, not *cèderai as would be
expected from the silent-e behaviour "céd+e" → cède. To make this
distinction, we use a feature muet ("silent") for specifying if the
"e" in the suffix is silent, as "e" (muet=y), not silent, as "ez"
(muet=n) or the "e" of the future or conditional tenses, for example
"erai/erais" (muet=fut_cond_e). Then, we restrict the rule for
doubling the consonant with the features spelling_type=double,
muet=y∨fut_cond_e, and the one for "é" → "è" with the features
spelling_type=change_é_è, muet=y.
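The interaction of the two features can be illustrated with a small
Python sketch. It is only a toy model under stated assumptions, not
the two-level rule formalism of (Carter, 1995): the class and function
names are hypothetical, the two vowel-change values are written with
their accents as change_e_è and change_é_è, and the muet constraint on
the "e" → "è" rule is our assumption, since the text only states the
constraints for consonant doubling and for "é" → "è".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stem:
    form: str
    spelling_type: str  # "double", "change_e_è", "change_é_è" or "none"


@dataclass(frozen=True)
class Suffix:
    form: str
    muet: str  # "y" (silent e), "n" (not silent) or "fut_cond_e"


def change_prefinal(form: str, old: str, new: str) -> str:
    """Replace the vowel just before the stem-final consonant, if it matches."""
    if len(form) >= 2 and form[-2] == old:
        return form[:-2] + new + form[-1]
    return form


def attach(stem: Stem, suffix: Suffix) -> str:
    """Join stem and suffix, applying at most one feature-controlled change."""
    form = stem.form
    if stem.spelling_type == "double" and suffix.muet in {"y", "fut_cond_e"}:
        # Consonant doubling: licensed by silent "e" and by future/conditional "e".
        form += form[-1]
    elif stem.spelling_type == "change_é_è" and suffix.muet == "y":
        # "é" -> "è": licensed by a genuinely silent "e" only.
        form = change_prefinal(form, "é", "è")
    elif stem.spelling_type == "change_e_è" and suffix.muet in {"y", "fut_cond_e"}:
        # Assumption: the text does not give the muet constraint for this rule.
        form = change_prefinal(form, "e", "è")
    return form + suffix.form


silent_e = Suffix("e", "y")
future = Suffix("erai", "fut_cond_e")
print(attach(Stem("cadet", "double"), silent_e))
print(attach(Stem("céd", "change_é_è"), future))
```

The point of the design is that the stem and the suffix each carry one
feature, and a rule fires only when both values match its constraint;
no rule needs to inspect the surface string beyond the letters it
rewrites.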

3.2 Inter-word spelling changes 

In English, inter-word spelling changes occur only in 
the alternation between "a" and "an" before conso- 
nant and vowel sounds respectively. In French, such 
changes are far more widespread and can be com- 
plex. However, they can be handled by judiciously 
specifying contexts in two-level rules and, in a few 
cases, by postulating non-obvious underlying lexical 
items. Some important cases are: 



• The "e" in the function words "de", "je", "le", "que", "se" and "te"
is elided before (most) words starting in a vowel sound, except when
the function word follows a hyphen: "le homme" → l'homme, "je ai" →
j'ai, but "puis-je avoir" does not elide, so the elision rule
specifies that the hyphen be absent from the context. "Ce" also elides
when used as a pronoun ("ce est" → c'est), but when used as a
determiner it takes the form "cet" before a vowel: cet homme. We
therefore take the underlying form of the determiner to be "cet",
which loses its "t" when followed by a consonant-initial word ("cet
soir" → ce soir).

• Numerals do not allow elision either: "le onze" does not become
*l'onze. We therefore treat the lexical form as being "#onze", where
"#" acts as an underlying consonant but is realised as a null. (Syntax
plays a role here too: "le un" → l'un when "un" is a determiner, but
not when it is a numeral. Thus lexically we have "un" as determiner
and "#un" as numeral).

• The very common preposition/article combinations "de"/"à" and
"le"/"les": "de le" → du, "à les" → aux, etc. These contractions span
constituent boundaries (we view du vol as being syntactically [PP de
[NP le vol]]) so need to be treated as spelling effects. Also, vowel
elision takes precedence: "de le homme" → de l'homme, not *du homme.

• Hyphens between verbs and clitic pronouns are treated as lexical
items in our grammar. They are realised as -t- when preceded by "a" or
"e" and followed by "e", "i" or "o": "va - il" → va-t-il, but "vont -
ils" → vont-ils. Hyphens joining nouns or names are treated as
different lexical items not subject to this change: "les vols Atlanta
- Indianapolis" does not involve introduction of "t".
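The cases listed above can be approximated by a short Python sketch
that maps a sequence of underlying lexical items to a surface string.
This is a procedural stand-in written for illustration, not the
two-level rules actually used in the system; the function names, the
tiny MUTE_H word list and the treatment of every "ce" token as the
pronoun are our assumptions.

```python
VOWELS = set("aeiouyàâéèêîôù")
MUTE_H = {"homme"}  # hypothetical list of h-initial words that count as vowel-initial
ELIDING = {"ce", "de", "je", "le", "que", "se", "te"}  # "ce" here is the pronoun
CONTRACTIONS = {("de", "le"): "du", ("de", "les"): "des",
                ("à", "le"): "au", ("à", "les"): "aux"}


def vowel_initial(word):
    """True if the word starts with a vowel sound; "#" counts as a consonant."""
    return bool(word) and (word[0].lower() in VOWELS or word.lower() in MUTE_H)


def realise(tokens):
    """Turn a list of underlying lexical items into a surface string."""
    pieces = []   # surface words built so far
    glue = False  # True when the current piece attaches to the previous one
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        prev = tokens[i - 1] if i > 0 else ""
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        after = tokens[i + 2] if i + 2 < len(tokens) else ""
        step, glue_next = 1, False
        if tok == "-":
            # Verb-clitic hyphen, realised as -t- between a/e and e/i/o.
            needs_t = prev[-1:] in ("a", "e") and nxt[:1] in ("e", "i", "o")
            surface = "-t-" if needs_t else "-"
            glue = glue_next = True
        elif tok in ELIDING and prev != "-" and vowel_initial(nxt):
            # Elision, blocked when the function word follows a hyphen.
            surface, glue_next = tok[:-1] + "'", True
        elif (tok, nxt) in CONTRACTIONS and not (nxt in ELIDING and vowel_initial(after)):
            # Contraction, unless the article itself is about to elide.
            surface, step = CONTRACTIONS[(tok, nxt)], 2
        elif tok == "cet" and nxt and not vowel_initial(nxt):
            surface = "ce"
        else:
            surface = tok
        surface = surface.replace("#", "")  # the underlying consonant is realised as null
        if glue and pieces:
            pieces[-1] += surface
        else:
            pieces.append(surface)
        glue = glue_next
        i += step
    return " ".join(pieces)


print(realise(["de", "le", "homme"]))
print(realise(["va", "-", "il"]))
```

Checking elision before contraction in the sketch mirrors the
precedence stated in the text. Hyphens joining nouns or names would
need a distinct token, which the sketch does not model.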

4 French syntax 

When comparing the French and English grammars, 
there are two types of objects of immediate interest: 
syntax rules and features. Looking first at the rules 
themselves, about 80% of the French syntax rules are 
either identical with or very similar to the English 
counterparts from which they have been adapted. 
Of the remainder, some rules (e.g. those for date,
time and number expressions) are different, but es- 
sentially too trivial to be worth describing in detail. 
Similar considerations apply to features. 

We will concentrate our exposition on the rules 
and features which are both significantly different, 
and possess non-trivial internal structure. Examin- 
ing the grammar, we find that there are three large 
interesting groups of rules and features, describing 
three separate complexes of linguistic phenomena: 
question-formation, clitic pronouns and agreement. As we have argued
previously (Rayner and Bouillon, 1995), all of these are rigid and
well-defined types of construction which occur in all genres of
written and spoken French. It is thus both desirable and reasonable to
attempt to encode them in terms of feature-based rules, rather than
(for instance) expecting to derive them as statistical regularities in
large corpora. In Sections 4.1, 4.2 and 4.3, we describe how we handle
these key problems.

4.1 Question-formation

We start this section by briefly reviewing the way in 
which question-formation is handled in the English 
CLE grammar. There are two main dimensions of 
classification: questions can be either WH- or Y-N; 
and they can use either the inverted or the unin- 
verted word-order. Y-N questions must use the in- 
verted word-order, but both word-orders are permis- 
sible for WH-questions. The phrase-structure rules 
analyse an inverted WH-question as constituting a 
fronted WH+ element followed by an inverted clause 
containing a gap element. The feature inv distinguishes inverted from
uninverted clauses. The following examples illustrate the top-level
structure of Y-N, unmoved WH- and moved WH-questions respectively.

[Does he love Mary]_{S:[inv=y]}

[Who loves Mary]_{S:[inv=n]}

[[Whom]_{NP} [does he love []_{NP}]_{S:[inv=y]}]

The French rules for question formation are struc- 
turally fairly similar to the English ones. However, 
there are several crucial differences which mean that 
the constructions in the two languages often differ 
widely at the level of surface form. Two phenomena 
in particular stand out. Firstly, English only permits 
subject-verb inversion when the verb is an auxiliary, 
or a form of "have" or "be"; in contrast, French 
potentially allows subject-verb inversion with any 
verb. For this reason, English question-formation 
using auxiliary "do" lacks a corresponding construc- 
tion in French. 

Secondly, French permits two other common question-formation
constructions in addition to subject-verb inversion: prefacing the
declarative version of the clause with the question particle "est-ce
que", and "complex inversion", i.e. fronting the subject and inserting
a dummy pronoun after the inverted verb. In certain circumstances,
primarily if the subject is the pronoun "ça", it is also possible to
form a non-subject WH-question out of a fronted WH+ phrase followed by
an uninverted clause containing an appropriate gap. We refer to this
last possibility as "pseudo-inversion".

If the subject is a pronoun, only inversion and the "est-ce que"
construction are allowed; if it is not a pronoun, only the "est-ce
que" construction and complex inversion are valid. In addition, a
subject pronoun following an inverted verb needs to be linked to it by
a hyphen, which can be realised as a "-t-" (cf. Section 3.2). Figure 1
presents examples illustrating the main French question constructions.

Modification of the English syntax rules to cap- 
ture the basic requirements so far is quite sim- 
ple. In our grammar, we added three extra rules 
to cover the "est-ce que", complex-inversion and
pseudo-inversion constructions: the second of these
rules combines the complex-inverted verb with the 
following dummy pronoun to form a verb, in essence 
treating the dummy pronoun as a kind of verbal af- 
fix. A further rule deals with the hyphen linking an 
inverted verb with a following subject. 

Y-N, inversion:
Aime-t-il Marie?

Y-N, "est-ce que":
Est-ce que Jean aime Marie?

Y-N, complex inversion:
Jean aime-t-il Marie?

WH, subject question, no inversion:
Quel homme aime Marie?

WH, inversion:
Quelle femme aime-t-il?

WH, "est-ce que":
Quelle femme est-ce que Jean aime?

WH, complex inversion:
Quelle femme Jean aime-t-il?

WH, pseudo-inversion:
Combien ça coûte?

Figure 1: Main French question constructions

With regard to the feature-set, the critical change involves the inv
feature. In English, as we saw, this feature had two possible values,
y and n. In French, the corresponding feature has five values:
inverted, uninverted, est_ce_que, complex and pseudo, distinguishing
clauses formed using the different question-formation constructions.
(It is important to note, though, that the semantic representation of
the clause is the same irrespective of its inversion-type). To enforce
the restrictions concerning combinations of inversion-type and subject
form, we also added a new clausal feature which distinguished clauses
in which the subject is a pronoun.
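The restriction on combinations of inversion-type and subject form can
be pictured as a simple compatibility check. The Python sketch below
is illustrative only: the enum, the function names and the encoding of
the clausal feature as a boolean are our assumptions, and it covers
only the three constructions for which the text states the
restriction, leaving out the uninverted and pseudo-inversion cases.

```python
from enum import Enum


class Inv(Enum):
    INVERTED = "inverted"
    UNINVERTED = "uninverted"
    EST_CE_QUE = "est_ce_que"
    COMPLEX = "complex"
    PSEUDO = "pseudo"


def licensed_inversions(subject_is_pronoun):
    """Question constructions allowed for a given kind of subject."""
    if subject_is_pronoun:
        return {Inv.INVERTED, Inv.EST_CE_QUE}
    return {Inv.EST_CE_QUE, Inv.COMPLEX}


def is_licensed(inv, subject_is_pronoun):
    return inv in licensed_inversions(subject_is_pronoun)


# (example, inversion type, subject is a pronoun), taken from Figure 1
EXAMPLES = [
    ("Aime-t-il Marie?", Inv.INVERTED, True),
    ("Est-ce que Jean aime Marie?", Inv.EST_CE_QUE, False),
    ("Jean aime-t-il Marie?", Inv.COMPLEX, False),
    ("Quelle femme aime-t-il?", Inv.INVERTED, True),
    ("Quelle femme est-ce que Jean aime?", Inv.EST_CE_QUE, False),
    ("Quelle femme Jean aime-t-il?", Inv.COMPLEX, False),
]

for text, inv, pronoun in EXAMPLES:
    print(text, inv.value, is_licensed(inv, pronoun))
```

In a unification grammar the same effect falls out of feature sharing
between the clause and the question rule, with no explicit lookup.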

The attractive aspect of this treatment is that the 
remaining English rules used for question-formation 
can be retained more or less unchanged. In particu- 
lar, the English semantic rules can still be used, and 
produce QLF representations with similar form. 

It would almost be true to claim that the above 
constituted our entire treatment of French question- 
formation. In practice, we have found it desirable 
to add a few more features to the grammar in or- 
der to block infelicitous combinations of the inver- 
sion rules with certain commonly occurring lexical 
items. It is possible that the effect of these features 
could be achieved equally well by statistical mod- 
elling or other means, but we describe them here for 
completeness: 



Restrictions on use of "est-ce que": Question-formation with "est-ce
que" is strongly dispreferred when the main verb is a clause-final
occurrence of "être", or existential "avoir" (as in "il y a"). For
example:

?Quand est-ce que le prochain vol est?
?Combien de vols est-ce qu'il y a?

We enforce this by adding a suitable feature to the verb category.

Fronting of "heavy" NPs: Most languages prefer not to front "heavy"
NPs, and this dispreference is particularly strong in French. We have
consequently added an NP feature called heavy, which has the value y
on NPs containing PP and VP post-modifiers. Thus for example
generation of

Quels vols en partance de Dallas y a-t-il?

is blocked, but the preferable

Quels vols y a-t-il en partance de Dallas?

is permitted.

Inverted subject NPs: Occurrence of some pronouns (in particular
"cela" and "ça") is strongly dispreferred in inverted subject
position. A binary feature enforces this as a rule, for example
blocking

Combien coûte ça pour aller à Boston?

but instead permitting

Combien ça coûte pour aller à Boston?

4.2 Clitics 

The most difficult technical problems in adapting 
an English grammar to a Romance language are un- 
doubtedly caused by clitic pronouns. In contrast to 
English, certain proform complements of verbs do 
not appear in their normal positions; instead, they 
occur adjacent to the main verb, and possibly joined 
to it by a hyphen. The position of the clitics in rela- 
tion to the verb (pre- or post-verbal) is determined 
by the mood of the verb, and whether or not the 
verb is negated. If two or more clitics are affixed 
to the verb, their internal order is determined by 
their surface forms. Several attempts to account for the above and
other data have previously been described in the literature, e.g.
(Grimshaw, 1982; Bès and Gardent, 1989; Estival, 1990; Miller and Sag,
1995); we have in particular been influenced by the last of these.



Although the underlying framework is very dif- 
ferent from the HPSG formalism used by Miller and 
Sag, our basic idea is the same: to treat "clitic move- 
ment" by a mechanism similar to the one used to 
handle WH movement. More specifically, we intro- 
duce two sets of new rules. The first set handles the 
"surface" clitics. They define the structure of the 
verb/clitic complex, which we, like Estival, regard 
as a constituent of category V composed of a main 
verb and a "clitic-list" . A second set of "gap" rules 
defines empty constituents of category NP or PP, 
occurring at the notional "deep" positions occupied 
by the clitics. Thus, for example, on our account the constituent
structure of "Est-ce que vous le voulez?" will be

[Est-ce que [[vous]_{NP} [le voulez]_{V} []_{NP}]_{S}]_{S}

where the "gap" NP category represents the notional direct object of
"voulez", realised at surface level by the pre-verbal clitic "le".

To make this work, we add an extra feature, 
clitics, to all categories which can participate in 
clitic movement: in our grammar, these are V, VP, 
S, NP and PP. The clitics feature is used to link 
the cliticised V constituent and its associated clitic 
gap or gaps. We have found it convenient to define 
the value of the clitics feature to be a bundle of 
five separate sub-features, one for each of the five 
possible clitic-positions in French. Thus for instance 
the second-position clitics "le" , "la" and "les" are re- 
lated to object-position clitic gaps through the sec- 
ond sub-feature of clitics; the fourth-position "y" 
clitic is related to its matching PP gap through the 
fourth sub-feature; and so on. The linking relation 
between a clitic-gap and its associated clitic is for- 
mally exactly the same as that obtaining between 
a WH-gap and its associated antecedent, and can if 
desired be conceptualized as a type of coindexing.

The clitics feature-bundle is threaded through 
the grammar rule which defines the structure of the 
list of clitics associated with a cliticised verb, and 
enforces the constraints on ordering of surface clitics. 
These constraints are encoded in the lexical entry for 
each clitic. 
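As a rough illustration of the five-slot bundle and the ordering
constraint, the following Python sketch assigns each clitic a position
and rejects clitic lists that are out of order. It is not the CLE
rule itself: the table and function names are hypothetical, and the
positions given for "lui", "leur" and "en" are our assumptions, since
the text only fixes those of "me"/"te", "le"/"la"/"les", "moi"/"toi"
and "y".

```python
# Position of each clitic in the five-slot bundle (1-based).
CLITIC_POSITION = {
    "me": 1, "te": 1,
    "le": 2, "la": 2, "les": 2,
    "moi": 3, "toi": 3,
    "lui": 3, "leur": 3,   # assumption, not stated in the text
    "y": 4,
    "en": 5,               # assumption, not stated in the text
}


def clitic_bundle(clitics):
    """Return the five-slot bundle for a clitic list, or None if misordered."""
    bundle = [None] * 5
    last = 0
    for clitic in clitics:
        position = CLITIC_POSITION[clitic]
        if position <= last:
            # Surface clitics must appear in strictly increasing slot order.
            return None
        bundle[position - 1] = clitic
        last = position
    return tuple(bundle)


print(clitic_bundle(["me", "les"]))   # as in "Vous me les donnez"
print(clitic_bundle(["les", "moi"]))  # as in "Donnez-les-moi"
print(clitic_bundle(["les", "me"]))   # misordered
```

Each filled slot would then be matched against the corresponding
sub-feature on a clitic gap, in the same way that a WH-gap is matched
against its antecedent.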

This basic framework is fairly straight-forward, 
though a number of additional features need to be 
added in order to capture the syntactic facts. We 
summarize the main points: 

Position of surface clitics: Clitics occur post- 
verbally in positive imperative clauses, other- 
wise pre-verbally. The clitic-list constituent 
consequently needs to share suitable features 
with the verb it combines with. 



Surface form of clitics: The first- and second-person singular clitics
are realised differently depending on whether they occur pre- or
post-verbally: for example "Vous me réservez un vol" versus
"Réservez-moi un vol". Moreover, "me" and "te" are first-position
clitics (e.g. "Vous me les donnez"), while "moi" and "toi" are
third-position ("Donnez-les-moi"). This alternation
is achieved simply by having separate lexical en- 
tries for each form. The entries have different 
syntactic features, but a common semantic rep- 
resentation. 

Special problems with the "en" clitic: 

The most abstruse problems occur in connec- 
tion with the "en" clitic, and are motivated by 
sentences like 

Combien en avez-vous? 

Here, our framework seems to dictate a con- 
stituent structure including three gaps, viz: 

[Combien [[en avez]_{V} [vous]_{NP} [[]_{V} [[]_{NP} []_{PP}]_{NP}]_{S}]_{S}]_{S}

in which the V gap links to "avez" , the NP gap 
to "combien", and the PP gap to "en". The 
specific difficulty here is that the "en" PP gap 
ends up as an NP modifier (it modifies the NP 
gap) . Normally, however, PP modifiers of NPs 
cannot be gaps, and the above type of construc- 
tion is the only exception we have found. 

Rather than relax the very common NP → NP PP rule to permit a gap PP
daughter, we introduce a second rule of this type which specifically
combines certain NPs, including suitable gaps resulting from
WH-movement, and
an "en" clitic gap. A feature, takespartative, 
picks out the NPs which can participate as left 
daughters in this rule. 

4.3 Agreement 

Although grammatical agreement is a linguistic phe- 
nomenon that plays a considerably larger role in 
French than in English, the adjustments needed to 
the lexicon and syntax rules are usually obvious. 
For instance, a feature has to be added to both
daughters of the rule for pre-nominal adjectival mod- 
ification, to enforce agreement in number and gen- 
der. In nearly all cases, this same procedure is used. 
A feature called agr is added to the relevant cat- 
egories, whose value is a bundle representing the 
category's person, number and gender, and the agr 
feature is shared between the categories which are 
required to agree. 



There are however some instances where agreement is less trivial. For
example, the subject and nominal predicate complement of "être" may
occasionally fail to agree in gender, e.g.

La gare est le plus grand bâtiment de la ville.

However, if the predicate complement is a pronoun ("lequel",
"celui-ci", "quel", ...) agreement in both gender and number is
obligatory: thus for instance

Quel/*quelle/*quels est le premier vol.

It would be most unpleasant to duplicate the syntax rules, with
separate versions for the pronominal and non-pronominal cases.
Instead, we add a second agreement feature (compagr) to the NP
category, which is constrained to have the same value as agr on
pronominal NPs; subject/predicate agreement can then use the compagr
feature on the predicate, getting the desired behaviour.
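The effect of the second agreement feature can be sketched in Python
as follows. This is an illustration under stated assumptions rather
than the grammar's own encoding: the NP class and the function name
are hypothetical, the agr bundle is modelled as a (person, number,
gender) tuple, and an unconstrained compagr is modelled as None, which
stands in for an unbound feature value and so matches anything.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Agr = Tuple[int, str, str]  # (person, number, gender)


@dataclass(frozen=True)
class NP:
    text: str
    agr: Agr
    pronominal: bool = False

    @property
    def compagr(self) -> Optional[Agr]:
        # Tied to agr on pronominal NPs, left unconstrained (None) otherwise.
        return self.agr if self.pronominal else None


def predicate_agrees(subject: NP, predicate: NP) -> bool:
    """Subject/predicate agreement stated on the predicate's compagr feature."""
    return predicate.compagr is None or predicate.compagr == subject.agr


gare = NP("la gare", (3, "sg", "f"))
batiment = NP("le plus grand bâtiment de la ville", (3, "sg", "m"))
vol = NP("le premier vol", (3, "sg", "m"))

print(predicate_agrees(gare, batiment))
print(predicate_agrees(vol, NP("quel", (3, "sg", "m"), pronominal=True)))
print(predicate_agrees(vol, NP("quelle", (3, "sg", "f"), pronominal=True)))
```

A single rule thus covers both cases: the check is vacuous for full
NPs and strict for pronouns.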

Similar considerations apply to the rule allowing 
modification of a NP by a "de" PP. In general, there 
is no requirement on agreement between the head 
NP and the NP daughter of the PP. However, for 
certain pronominal NPs ("lequel", "l'un", "chacun") gender agreement
is obligatory, e.g.

lequel/*laquelle de ces vols
laquelle/*lequel de ces dates

This is dealt with correspondingly, by addition of a new agreement
feature specific to the NP → NP PP rule.

5 Spanish syntax 

This section briefly describes the interesting features 
of the Spanish syntactic rule-set. In general, the 
Spanish rules were distinctly simpler than the French 
ones. With a few exceptions noted below (in partic- 
ular, prodrop), the current Spanish syntax rules are 
essentially a slightly modified subset of the French 
ones. Despite this, they give very adequate coverage 
of the ATIS domain, the only one in which they have so
far been seriously tested. In a little more detail: 

Question-formation: The Spanish rules for question-formation are
similar to, but less elaborate than the French ones. Subject-verb
inversion is allowed with any subject; there is no restriction that it
be pronominal. There are no constructions corresponding to "est-ce
que" or complex inversion. When the inverted subject is a pronoun, it
does not require a preceding hyphen linking it to the verb.

Clitics: The Spanish clitic system is also considerably simpler than
the French one. There are fewer clitics; in particular, there are no
clitics corresponding to the French "y" and "en", which as we saw in
Section 4.2 above gave rise to many of the difficult problems in
French.

Postverbal clitics are affixed directly to the verb, rather than being
joined by hyphens. Since CLE morphotax rules have a uniform format
(Alshawi (ed), 1992, §3.9), this only involved moving the relevant
syntax rules to the morphology rule file.

Footnote (on the pronominal use of "quel" in Section 4.3): Most French
grammars regard "quel" as an adjective, but for semantic reasons we
have found it more convenient to treat it as a pronoun in this type of
construction and as a determiner in expressions like "quel vol".

Phrasal rules: The rules for Spanish numbers, 
dates and times are substantially different from 
the French ones, and those for dates in par- 
ticular needed to be rewritten more or less 
from scratch. The issues involved are however 
straight-forward. 

Also, the form of the Spanish superlative adjective is slightly
different: the postnominal superlative adjective has no extra article,
e.g. "le vol le [plus cher]" versus "la plaza [menos cara]". The
necessary adjustments are again simple.

Relative clauses: A less trivial difference involves 
relative clauses. In Spanish, the main verb of 
the relative clause must be in the subjunctive 
mood if it modifies an argument of a verb in 
the imperative mood. Thus for example 

Which is the first flight that serves a meal?
→ Cuál es el primer vuelo que sirve una comida?

("sirve" = present indicative), but

Show me flights that serve a meal!
→ Enséñeme los vuelos que sirva una comida

("sirva" = present subjunctive). Handling this 
alternation correctly involves trailing an extra 
feature through many grammar rules, so as to 
link the main verb in the relative clause to the 
main verb in the clause immediately above it. 

Prodrop: The second substantial change required
when adapting the French grammar to Spanish
was necessitated by the prodrop rule: Spanish,
unlike French, permits and indeed encourages
omission of the subject when it is a pronoun.
Perhaps surprisingly, prodrop in fact
only resulted in a few divergences between the
Spanish and French grammars. A new syntax
rule of the form S -> VP was added (it
is in fact a slightly modified version of the
French imperative-formation rule). The associated
semantic rule fills in a representation of the
omitted clausal subject from the main verb; to
make this possible, the semantic entries for
inflected verbs are all modified to contain an extra
feature encoding the possible prodrop subject.
The details are straightforward.
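
A minimal Python sketch of this arrangement is given below. It is an
illustration under stated assumptions, not the CLE rule format: the
VerbEntry class, the ("pro", person, number) encoding of the omitted
subject and the two rule functions are all hypothetical names chosen
for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VerbEntry:
    form: str                              # inflected surface form
    predicate: str                         # semantic predicate
    prodrop_subject: Tuple[str, int, str]  # extra feature: ("pro", person, number)


def s_from_np_vp(subject, verb, obj: Optional[str] = None):
    """Ordinary S -> NP VP: the subject comes from the overt NP."""
    return {"pred": verb.predicate, "subj": subject, "obj": obj}


def s_from_vp(verb, obj: Optional[str] = None):
    """Prodrop S -> VP: the subject is read off the verb's extra feature."""
    return {"pred": verb.predicate, "subj": verb.prodrop_subject, "obj": obj}


quiero = VerbEntry("quiero", "querer", ("pro", 1, "sg"))
quiere = VerbEntry("quiere", "querer", ("pro", 3, "sg"))

print(s_from_vp(quiero, "un vuelo"))
print(s_from_np_vp("Juan", quiere, "un vuelo"))
```

Because the subject information lives in the verb entry, the rest of the
grammar can stay as it is; only the one new clause-level rule consults
the extra feature.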

6 Conclusions 

The preceding sections describe in essence all the 
changes we needed to make in order to adapt a 
substantial English language processing system to 
French and Spanish. Due to space limitations, we 
have been obliged to present some of the details in 
a more compressed form than we would ideally have 
wished, but nothing important has been omitted. 
Creation of a good initial French version required
about five person-months of effort; after this, the
Spanish version took only about two person-months.
We do not believe that we were greatly aided by any 
special features of the Core Language Engine, other 
than the fact that it is a well-engineered piece of 
software based on sound linguistic ideas. Our overall 
conclusion is that an English-language system con- 
forming to these basic design principles should in 
general be fairly easy to port to Romance languages. 

Acknowledgements 

The work described here was supported by SRI In- 
ternational, Suissetra, and Telia Research AB, Swe- 
den. We would like to acknowledge the assistance 
provided by Gabriela Fernandez of the University 
of Seville in developing the Spanish version of the 
system, and thank Sabine Lehmann, David Milward 
and Steve Pulman for helpful comments. 



References 

[Agnas et al. 1994] Agnas, M-S., Alshawi, H., Bretan,
I., Carter, D.M., Ceder, K., Collins, M., Crouch,
R., Digalakis, V., Ekholm, B., Gamback, B., Kaja,
J., Karlgren, J., Lyberg, B., Price, P., Pulman,
S., Rayner, M., Samuelsson, C. and Svensson, T.
1994. Spoken Language Translator: First Year
Report. SRI technical report CRC-0430.

[Alshawi (ed) 1992] Alshawi, H. (ed.) 1992. The Core
Language Engine. MIT Press. 

[Alshawi et al. 1992] Alshawi, H.,
Carter, D., Crouch, R., Pulman, S., Rayner, M.
and Smith, A. 1992. CLARE: A Contextual Reasoning
and Cooperative Response Framework for
the Core Language Engine. SRI technical report
CRC-028.

[Alshawi and Carter 1994] Alshawi, H., and
Carter, D. 1994. Training and
Scaling Preference Functions for Disambiguation.
Computational Linguistics, 20:4.

[Alshawi and Crouch 1992] Alshawi, H. and
Crouch, R. 1992. Monotonic Semantic Interpretation.
Proceedings of 30th ACL.

[Bes and Gardent 1989] Bes, G. and Gardent, C. 
1989. French Order without Order. Proceedings 
of 4th European ACL. 

[Carter 1995] Carter, D. 1995. Rapid Development
of Morphological Descriptions for Full Language
Processing Systems. Proceedings of 7th European
ACL. Also SRI Technical Report CRC-047.

[Estival 1990] Estival, D. 1990. Generating French
with a Reversible Unification Grammar.
Proceedings of 13th COLING.

[Grimshaw 1982] Grimshaw, J. 1982. On the Lexical 
Representation of Romance Reflexives. In J. Bres- 
nan (ed.), The Mental Representation of Gram- 
matical Relations. MIT Press. 

[Kaplan and Kay 1994] Kaplan, R., and Kay, M. 
1994. Regular Models of Phonological Rule Sys- 
tems. Computational Linguistics, 20:3, 331-378. 

[Miller and Sag 1995] Miller, P. and Sag, I. 1995. 
French Clitic Movement Without Clitics or Move- 
ment. CSLI Technical Report. 

[Pulman 1992] Pulman, S. 1992. Unification-Based
Syntactic Analysis. In (Alshawi (ed), 1992).



[Rayner 1993] Rayner, M. 1993. Abductive Equivalential
Translation and its Application to Natural
Language Database Interfacing. PhD thesis,
Royal Institute of Technology/Stockholm University.
Also SRI Technical Report CRC-052.

[Rayner 1994] Rayner, M. 1994. English linguistic
coverage. In (Agnas et al., 1994).



2 All SRI Cambridge technical reports are available
through WWW from http://www.cam.sri.com



[Rayner et al. 1993a] Rayner, M., Alshawi, H., Bre- 
tan, I., Carter, D.M., Digalakis, V., Gamback, B., 
Kaja, J., Karlgren, J., Lyberg, B., Price, P., Pul- 
man, S. and Samuelsson, C. 1993. A Speech to 
Speech Translation System Built From Standard 
Components. Proc. 1st ARPA workshop on Hu- 
man Language Technology. Also SRI Technical 
Report CRC-031. 

[Rayner and Bouillon 1995] Rayner, M. and Bouillon,
P. 1995. Hybrid Transfer in an English-French
Spoken Language Translator. Proceedings
of IA '95, Montpellier, France. Also SRI Technical
Report CRC-056.

[van Eijck and Moore 1992] van Eijck, J. and
Moore, R. 1992. Semantic Rules
for English. In (Alshawi (ed), 1992).



